34 research outputs found

    Evaluation of deep neural networks for reduction of credit card fraud alerts

    Fraud detection systems support advanced detection techniques based on complex rules, statistical modelling and machine learning. However, alerts triggered by these systems still require expert judgement to either confirm a fraud case or discard a false positive. Reducing the number of false positives that fraud analysts investigate, by automating their detection with computer-assisted techniques, can lead to significant cost efficiencies. Alert reduction has been achieved with different techniques in related fields such as intrusion detection. Furthermore, deep learning has been used to accomplish this task in other fields. In our paper, a set of deep neural networks has been tested to measure their ability to detect false positives by processing alerts triggered by a fraud detection system. The performance achieved by each neural network setting is presented and discussed. The optimal setting captured 91.79% of total fraud cases with 35.16% fewer alerts. The obtained alert reduction rate would entail a significant reduction in the cost of human labor, because alerts classified as false positives by the neural network would not require human inspection.
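
    The following sketch illustrates the general idea of alert reduction with a feed-forward classifier; it is not the paper's architecture, and the synthetic features, thresholds and layer sizes are illustrative assumptions only.

```python
# Hypothetical sketch: a small feed-forward network scores alerts, and only
# alerts with a non-negligible fraud probability are kept for human review.
# Data, feature names and the 0.05 threshold are illustrative, not from the paper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))           # stand-in for alert features
y = (rng.random(5000) < 0.1).astype(int)  # 1 = confirmed fraud, 0 = false positive

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

# Alerts with a very low predicted fraud probability could be auto-discarded;
# the remaining alerts would still be inspected by analysts.
proba = clf.predict_proba(scaler.transform(X_test))[:, 1]
auto_discard = proba < 0.05
print(f"Alerts removed from the review queue: {auto_discard.mean():.1%}")
print(f"Fraud cases still reviewed: {(y_test[~auto_discard] == 1).sum()} of {(y_test == 1).sum()}")
```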

    Integrating descriptions of knowledge management learning activities into large ontological structures: A case study

    Ontologies have been recognized as a fundamental infrastructure for advanced approaches to Knowledge Management (KM) automation, and the conceptual foundations for them have been discussed in some previous reports. Nonetheless, such conceptual structures should be properly integrated into existing ontological bases, for the practical purpose of providing the required support for the development of intelligent applications. Such applications should ideally integrate KM concepts into a framework of commonsense knowledge with clear computational semantics. In this paper, such an integration work is illustrated through a concrete case study, using the large OpenCyc knowledge base. Concretely, the main elements of the Holsapple & Joshi KM ontology and some existing work on e-learning ontologies are explicitly linked to OpenCyc definitions, providing a framework for the development of functionalities that use the built-in reasoning services of OpenCyc in KM activities. The integration can be used as the point of departure for the engineering of KM-oriented systems that account for a shared understanding of the discipline and rely on public semantics provided by one of the largest open knowledge bases available.
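
    As a rough illustration of what "explicitly linking" ontology elements can look like in practice, the sketch below asserts mappings between placeholder KM-ontology terms and placeholder OpenCyc concept URIs using rdflib; the namespaces and concept identifiers are assumptions, not the paper's actual mapping.

```python
# Minimal sketch (not the paper's mapping): link KM-ontology terms to OpenCyc
# definitions so that OpenCyc reasoning can be reused over KM descriptions.
# All URIs and concept names below are placeholders.
from rdflib import Graph, Namespace, RDFS, OWL

KM = Namespace("http://example.org/km-ontology#")    # hypothetical KM namespace
CYC = Namespace("http://sw.opencyc.org/concept/")    # OpenCyc base; concept IDs are placeholders

g = Graph()
g.bind("km", KM)
g.bind("cyc", CYC)

# A KM learning activity is declared a specialization of an OpenCyc concept,
# and a KM knowledge resource is declared equivalent to another one.
g.add((KM.LearningActivity, RDFS.subClassOf, CYC["Placeholder_Activity"]))
g.add((KM.KnowledgeResource, OWL.equivalentClass, CYC["Placeholder_InformationBearingThing"]))

print(g.serialize(format="turtle"))
```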

    Comparing social media and Google to detect and predict severe epidemics

    Internet technologies have demonstrated their value for the early detection and prediction of epidemics. In diverse cases, electronic surveillance systems can be created by obtaining and analyzing on-line data, complementing other existing monitoring resources. This paper reports on the feasibility of building such a system with search engine and social network data. Concretely, this study aims at gathering evidence on which kind of data source leads to better results. Data have been acquired from the Internet by means of a system which gathered real-time data for 23 weeks. Data on influenza in Greece have been collected from Google and Twitter, and they have been compared to influenza data from the official authority of Europe. The data were analyzed using two models: an ARIMA model that computed estimations based on weekly sums, and a customized approximate model that uses daily sums. Results indicate that influenza was successfully monitored during the test period. Google data show a high Pearson correlation and a relatively low Mean Absolute Percentage Error (R=0.933, MAPE=21.358). Twitter results are slightly better (R=0.943, MAPE=18.742). The alternative model is slightly worse than the ARIMA(X) (R=0.863, MAPE=22.614), but with a higher mean deviation (abs. mean dev.: 5.99% vs 4.74%).
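
    A minimal sketch of this kind of evaluation is shown below: an ARIMA model with an exogenous on-line signal is fitted and scored with Pearson's R and MAPE. The synthetic series stands in for the study's Google/Twitter and official influenza counts; the model order and split are arbitrary assumptions.

```python
# Illustrative sketch (synthetic data, not the study's dataset): fit ARIMA with
# an exogenous regressor (e.g. weekly search/social counts) and compare the
# forecast against "official" counts using Pearson's R and MAPE.
import numpy as np
from scipy.stats import pearsonr
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
weeks = 23
official = 100 + 10 * np.sin(np.linspace(0, 3, weeks)) + rng.normal(0, 3, weeks)
online_signal = official * 0.9 + rng.normal(0, 5, weeks)   # proxy from on-line data

train, test = slice(0, 18), slice(18, weeks)
model = ARIMA(official[train], exog=online_signal[train], order=(1, 0, 0)).fit()
forecast = model.forecast(steps=weeks - 18, exog=online_signal[test].reshape(-1, 1))

r, _ = pearsonr(official[test], forecast)
mape = np.mean(np.abs((official[test] - forecast) / official[test])) * 100
print(f"R = {r:.3f}, MAPE = {mape:.2f}%")
```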

    On the graph structure of the Web of Data

    This article describes how the Web of Data has emerged as the realization of a machine-readable web relying on the Resource Description Framework (RDF) language as a way to provide richer semantics to datasets. While the Web of Data is based on similar principles to the original Web, with interlinking being the principal mechanism to relate information, the differences in the structure of the information are evident. Several studies have analysed the graph structure of the Web, yielding important insights that were used in relevant applications. However, those findings cannot be transposed to the Web of Data, due to fundamental differences in production, link creation and usage. This article reports on a study of the graph structure of the Web of Data using methods and techniques from similar studies for the Web. Results show that the Web of Data also complies with the bow-tie theory. Other characteristics are the low distance between nodes and the low closeness and degree centrality. Regarding the datasets, the biggest one is Open Data Euskadi, but the one with the most connections to other datasets is DBpedia.
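
    The sketch below shows the type of measurement involved in a bow-tie analysis (largest strongly connected component as CORE, plus IN/OUT parts) and the centrality figures mentioned above; it runs on a random directed graph, not on actual Web of Data crawl data.

```python
# Toy sketch of a bow-tie decomposition and centrality summary on a directed
# link graph. The random graph is a stand-in for a Web of Data crawl.
import networkx as nx

g = nx.gnp_random_graph(200, 0.02, directed=True, seed=42)

core = max(nx.strongly_connected_components(g), key=len)   # bow-tie CORE = largest SCC
rep = next(iter(core))                                     # any CORE node reaches the same sets
out_part = nx.descendants(g, rep) - core                   # reachable from CORE: OUT
in_part = nx.ancestors(g, rep) - core                      # can reach CORE: IN
print(f"CORE={len(core)}, IN={len(in_part)}, OUT={len(out_part)}")

degree = nx.degree_centrality(g)
closeness = nx.closeness_centrality(g)
print("mean degree centrality:", sum(degree.values()) / len(degree))
print("mean closeness centrality:", sum(closeness.values()) / len(closeness))
```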

    Detecting browser drive-by exploits in images using deep learning

    Steganography is the set of techniques aiming to hide information in messages such as images. Recently, steganographic techniques have been combined with polyglot attacks to deliver exploits in Web browsers. Machine learning approaches have been proposed in previous works as a solution for detecting steganography in images, but the specifics of hiding exploit code have not been systematically addressed to date. This paper proposes the use of deep learning methods for such detection, accounting for the specifics of the situation in which the images and the malicious content are delivered using spatial- and frequency-domain steganography algorithms. The methods were evaluated using benchmark image databases with collections of JavaScript exploits, for different density levels and steganographic techniques in images. A convolutional neural network was built to classify the infected images, with a validation accuracy of around 98.61% and a validation AUC score of 99.75%.
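
    A minimal sketch of such a classifier is given below, with accuracy and AUC tracked during training; the input size, layer sizes and random data are placeholder assumptions, not the network reported in the paper.

```python
# Hedged sketch of a small CNN for flagging images that may carry hidden
# payloads (1 = stego/exploit, 0 = clean). Shapes and data are illustrative.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(256, 64, 64, 3).astype("float32")   # stand-in image patches
y = np.random.randint(0, 2, size=256)

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", keras.metrics.AUC(name="auc")])
model.fit(X, y, validation_split=0.2, epochs=2, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, accuracy, auc]
```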

    Traceability for trustworthy AI: a review of models and tools

    Traceability is considered a key requirement for trustworthy artificial intelligence (AI), related to the need to maintain a complete account of the provenance of data, processes, and artifacts involved in the production of an AI model. Traceability in AI shares part of its scope with general-purpose recommendations for provenance such as W3C PROV, and it is also supported to different extents by specific tools used by practitioners as part of their efforts in making data analytic processes reproducible or repeatable. Here, we review relevant tools, practices, and data models for traceability in their connection to building AI models and systems. We also propose some minimal requirements to consider a model traceable according to the assessment list of the High-Level Expert Group on AI. Our review shows how, although a good number of reproducibility tools are available, a common approach is currently lacking, together with the need for shared semantics. Besides, we have detected that some tools have either not achieved full maturity or are already falling into obsolescence or a state of near abandonment by their developers, which might compromise the reproducibility of the research entrusted to them.
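
    To make the W3C PROV connection concrete, here is a minimal sketch (using the "prov" Python package, one possible tool among many) of the kind of provenance record a traceable training pipeline could keep; all entity, activity and agent names are illustrative.

```python
# Minimal W3C PROV sketch: record which dataset and which training run produced
# which model, and who was responsible. Identifiers are placeholders.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

data = doc.entity("ex:training-data")
model = doc.entity("ex:model-v1")
run = doc.activity("ex:training-run-01")
scientist = doc.agent("ex:data-scientist")

doc.used(run, data)                     # the run consumed the dataset
doc.wasGeneratedBy(model, run)          # the model is an output of the run
doc.wasAssociatedWith(run, scientist)   # responsibility for the run
doc.wasDerivedFrom(model, data)         # the model's provenance traces back to the data

print(doc.serialize(indent=2))          # PROV-JSON serialization
```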

    Evolution and prospects of the Comprehensive R Archive Network (CRAN) package ecosystem

    Free and open source software package ecosystems have existed for a long time, but such collaborative development practice has surged in recent years. One of the oldest and most popular package ecosystems is the Comprehensive R Archive Network (CRAN), the repository of packages of the statistical language R, a popular statistical computing environment. CRAN stores a large number of packages that are updated regularly and depend on many other packages in a complex graph of relations. As the repository grows, its sustainability could be threatened by that complexity or by the nonuniform evolution of some packages. This paper provides an empirical analysis of the evolution of the CRAN repository over the last 20 years, considering the laws of software evolution and the effect of CRAN's policies on such development. Results show how the progress of CRAN is consistent with the laws of continuous growth and change, and how there seems to be a relevant increase in complexity in recent years. Significant challenges are arising related to the scale and scope of software package managers and the services they provide; understanding how they change over time and what might endanger their sustainability are key factors for their future improvement, maintenance, policies, and, eventually, the sustainability of the ecosystem.
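
    The sketch below shows one simple way to obtain a snapshot of that dependency graph by parsing CRAN's PACKAGES index into Imports/Depends edges; the field handling is deliberately simplified and the script assumes network access to the CRAN URL.

```python
# Rough sketch: parse CRAN's PACKAGES index (Debian-control-style records) and
# build a directed graph of Depends/Imports relations between packages.
import re
import urllib.request
import networkx as nx

URL = "https://cran.r-project.org/src/contrib/PACKAGES"
text = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

graph = nx.DiGraph()
for record in text.split("\n\n"):
    # key: value pairs, where continuation lines start with whitespace
    fields = dict(re.findall(r"^([A-Za-z]+): ?(.*(?:\n .*)*)", record, re.M))
    name = fields.get("Package")
    if not name:
        continue
    deps = fields.get("Depends", "") + "," + fields.get("Imports", "")
    for dep in re.split(r"[,\n]", deps):
        dep = re.sub(r"\(.*?\)", "", dep).strip()   # drop version constraints
        if dep and dep != "R":
            graph.add_edge(name, dep)

print(f"{graph.number_of_nodes()} packages, {graph.number_of_edges()} dependency edges")
```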

    Predicting length of stay across hospital departments

    The length of hospital stay and its implications have a significant economic and human impact. As a consequence, the prediction of that key parameter has been the subject of research in recent years. Most previous work has analysed length of stay in particular hospital departments within specific study groups, which has resulted in successful prediction rates but only occasionally reported predictive patterns. In this work we report a predictive model for length of stay (LOS), together with a study of trends and patterns that supports a better understanding of how LOS varies across different hospital departments and specialties. We also analyse in which hospital departments the prediction of LOS from patient data is more insightful. After estimating prediction rates, several patterns were found; those patterns allowed us, for instance, to determine how to increase prediction accuracy in women admitted to the emergency room for enteritis problems. Overall, concerning these recognised patterns, the results are up to 21.61% better than the results with baseline machine learning algorithms in terms of error rate, and up to 23.83% in terms of success rate in the number of predicted cases, which is useful to guide the decision on where to focus attention when predicting LOS.
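
    The following sketch illustrates the comparison against a baseline on synthetic records with hypothetical feature names (age, department, admission type); it is only a stand-in for the per-department analysis described above, not the paper's model or data.

```python
# Illustrative sketch: predict LOS from admission features and compare the
# error against a mean-predicting baseline. Records and features are synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 95, n),
    "department": rng.integers(0, 5, n),        # encoded hospital department
    "admission_type": rng.integers(0, 3, n),    # e.g. emergency vs. scheduled
})
df["los_days"] = 2 + 0.05 * df["age"] + df["department"] + rng.exponential(2, n)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="los_days"), df["los_days"], random_state=0)

baseline_mae = mean_absolute_error(y_test, np.full(len(y_test), y_train.mean()))
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
model_mae = mean_absolute_error(y_test, model.predict(X_test))

print(f"baseline MAE={baseline_mae:.2f} days, model MAE={model_mae:.2f} days")
print(f"improvement over baseline: {(baseline_mae - model_mae) / baseline_mae:.1%}")
```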

    Authority-based conversation tracking in Twitter: an unattended methodological approach

    Twitter is undoubtedly one of the most widely used data sources to analyze human communication. The literature is full of examples where Twitter is accessed and data are downloaded as the previous step to a more in-depth analysis in a wide variety of knowledge areas. Unfortunately, the extraction of relevant information from the opinions that users freely express on Twitter is complicated, both because of the volume generated (more than 6,000 tweets per second) and the difficulty of isolating only what is pertinent to our research. Inspired by the fact that a large part of users use Twitter to communicate or receive political information, we created a method that allows for the monitoring of a set of users (which we will call authorities) and the tracking of the information published by them about an event. Our approach consists of dynamically and automatically monitoring the hottest topics among all the conversations in which the authorities are involved, and retrieving the tweets connected with those topics while filtering other conversations out. Although our case study applies the method to the political discussions held during the Spanish general, local, and European elections of April/May 2019, the method is equally applicable to many other contexts, such as sporting events, marketing campaigns, or health crises.
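
    The schematic sketch below captures the two-step idea (find hot topics among the authorities, then keep any tweet about those topics); the tweet objects are plain dicts and the handles are hypothetical, so it does not depend on live Twitter API calls.

```python
# Schematic sketch of authority-based tracking: hot topics are the most
# frequent hashtags among a fixed set of authorities; the tracked conversation
# is any tweet mentioning those topics. Handles and tweets are illustrative.
from collections import Counter

AUTHORITIES = {"party_a", "party_b", "candidate_c"}   # hypothetical handles

tweets = [
    {"user": "party_a", "text": "Our plan for #elections2019 and #housing"},
    {"user": "random_user", "text": "Nice weather today"},
    {"user": "candidate_c", "text": "Debate tonight! #elections2019"},
    {"user": "random_user", "text": "Watching the debate #elections2019"},
]

def hashtags(text):
    return {w.lower() for w in text.split() if w.startswith("#")}

# 1) Hot topics = most frequent hashtags in the authorities' own tweets.
counts = Counter(tag for t in tweets if t["user"] in AUTHORITIES
                 for tag in hashtags(t["text"]))
hot_topics = {tag for tag, _ in counts.most_common(2)}

# 2) Track the conversation: keep any tweet (from anyone) about a hot topic.
tracked = [t for t in tweets if hashtags(t["text"]) & hot_topics]
print(hot_topics, len(tracked), "tweets tracked")
```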

    Modeling Bacterial Species: Using Sequence Similarity with Clustering Techniques

    Existing studies have challenged the current definition of named bacterial species, especially in the case of highly recombinogenic bacteria. This has led to considering the use of computational procedures to examine potential bacterial clusters that are not identified by species naming. This paper describes the use of sequence data obtained from MLST databases as input for a k-means algorithm extended to use housekeeping gene sequences as a metric of similarity for the clustering process. An implementation of the k-means algorithm has been developed based on an existing source code implementation, and it has been evaluated against MLST data. Results point to potential bacterial clusters that are close to more than one named species and may thus become candidates for alternative classifications accounting for genotypic information. The use of hierarchical clustering with sequence comparison as a similarity metric has the potential to find clusters different from named species by using a more informed cluster formation strategy than a conventional nominal variant of the algorithm.
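
    Since plain k-means requires vector means, a common way to use a sequence metric instead is a k-medoids-style variant; the sketch below is such a simplified stand-in, using normalized Hamming distance over toy concatenated gene sequences (not actual MLST data, and not the paper's implementation).

```python
# Simplified stand-in: cluster MLST-style sequences with a k-medoids variant of
# k-means, using normalized Hamming distance between concatenated
# housekeeping-gene sequences as the similarity metric. Sequences are toy data.
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def k_medoids(seqs, k, iters=20, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(range(len(seqs)), k)
    for _ in range(iters):
        # assign each sequence to its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, s in enumerate(seqs):
            nearest = min(medoids, key=lambda m: hamming(s, seqs[m]))
            clusters[nearest].append(i)
        # move each medoid to the member minimizing total intra-cluster distance
        new_medoids = [min(members, key=lambda c: sum(hamming(seqs[c], seqs[j]) for j in members))
                       for members in clusters.values() if members]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

seqs = ["ACGTACGT", "ACGTACGA", "TTGTACGA", "TTGTCCGA", "ACGAACGT"]
print(k_medoids(seqs, k=2))   # maps medoid index -> member indices
```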